Classes take place from 9.15 to 12.00 and from 13.15 to 16.00 in room 136, unless otherwise stated.
Rooms for group work:
Tuesday 28th: HAL224A, HAL228, HAL302, HAL303, E42, E43
Monday 3rd: HAL224, HAL224a, HAL228, HAL302, HAL303, E42
Tuesday 4th: HAL228, HAL302, HAL303, E42, E43, E45; NM:HAL 218
Here you will find a list of class members, their contact information and the groups.
Dear students,
A warm welcome to the module Data skills for social work professionals!
As the first Monday of our course is a public holiday (Whit Monday, Pfingsten), we would like you to complete a few preparation tasks before the first meeting on Tuesday 21st.
Enroll on the Moodle page (https://moodle.bfh.ch/course/view.php?id=37097) with the following enrollment key: FS24-bsc.
It is essential that you have R and RStudio installed and running on your computer before the first classroom session. Please follow the instructions in the “Installation of R and R-Studio” guide (https://drive.switch.ch/index.php/s/ktNsnWxwkJ3olWG) and, if necessary, refer to the linked instructions on YouTube. If you have any questions, please feel free to contact us by email.
Familiarize yourself with R. We would like you to take advantage of new AI tools and ask Copilot to give you a guided tour of R (https://www.bing.com/chat?q=Microsoft+Copilot&FORM=hpcodx). Instruct Copilot on its task with the text below.
Finally, we invite you to familiarize yourself with the topic of “data science” and its application in social work. Create a post in the forum (https://moodle.bfh.ch/mod/forum/view.php?id=2165224) in which you give a concrete example of how data science can help improve the effectiveness of social work or promote the well-being of clients. What are the potential benefits and challenges of applying data science in this field? We look forward to reading your perspectives and ideas on this topic.
We wish you a successful preparation period and look forward to meeting you in person soon. Please let us know should you have any questions.
Kind regards
Dorian Kessler, Samin Sepahniya
Text to enter into Copilot (Microsoft Copilot in Bing; important: use the conversation style «More Creative» (German interface: «im höheren Masse kreativ»), which you can select with the button in the middle of the screen):
As a social work student, I would like to learn the basics of the R programming language so that I can carry out statistical data analyses for social work projects. I have no prior knowledge of statistics or programming. Could you please give me a step-by-step introduction? Please start by asking whether I have installed R and RStudio and, if not, help me install R and RStudio. Then show me the basic commands and functions of R. I would like to learn how to carry out simple data analyses (with dplyr), visualize data (with ggplot2) and interpret the results. Please keep the following in mind:
Take a step-by-step approach. Only tell me about the next step once the current step is completed. After each step, ask whether I was able to complete it successfully, to make sure that I have done everything correctly.
As the first step, tell me exactly how to find my way around RStudio visually and where I need to enter things. Where are the console, the script, the data overview and the file overview located in RStudio?
Explain what the console is and what an R script is, how to create and save an R script, and what the purpose of scripts is. Work with me in an R script and tell me how to run commands.
Please guide me through practical exercises and give me tasks to consolidate what I have learned.
Support me whenever something is unclear.
Use examples that are relevant to social work. Make up relevant data from the fields of social assistance or child and adult protection.
Comment the code in detail, line by line, so that I understand it exactly.
At the end, offer me further exercises in case I would like to continue. Make suggestions for exercises.
You are an R expert, but you also know that prospective social workers have little programming knowledge and that technical terms need an explanation in everyday language.
Thank you for your motivated support and helpfulness! You are helping me learn R and use this knowledge for the benefit of clients.
Important details:
Please leave out «print()» unless it is necessary.
Whenever you mention Strg, also add Ctrl, as some people have English Windows keyboards.
Participants learn basic data science tools.
Participants learn how to integrate data science into solving social work problems.
Participants learn how to do data science with R.
A term that emerged about 10 years ago; its predecessors are statistics and data analysis.
The science of creating valuable information from data
Practice-oriented science
Combines technical and field expertise
Data contain information on human behavior and therefore help us better understand the human world and solve human problems.
In the era of AI, “data literacy” becomes a key skill in all areas of life, including social work, so it should be a basic competence:
Data awareness
Skills to interpret and analyze data
Think of a social work field
What is the goal of social work in that field? Which aspects of your clients’ lives do you want to improve?
What existing data could you use to measure these aspects of your clients’ lives? Who owns the data? What specific information would you use to measure this? What are technical and ethical limitations?
Post your answers on this Padlet.
Structure
Introduction: presentation of the research question and its relevance for social work
Methods: documentation of which data were used and how they were analyzed
Results: presentation of the results
Conclusion: discussion and interpretation of the results with reference to the subject matter and mandate of social work
In addition, students submit an R code file that documents the data preparation and analysis steps. The code file must be reproducible and must produce the results used.
The competence assessment (documentation, R code) is written in groups of 2-3 people, but includes parts in the text or in the code file for which individual members are responsible (e.g. in the text: introduction, methods, results, conclusion; in the code: preparation and analysis). The individual contributions must be identified as such at the end of the documentation (by chapter; for code, by line numbers).
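A reproducible code file, as required above, should run from top to bottom in a fresh R session without manual steps. The skeleton below is only one possible layout: the section names, the seed value and the commented-out file path are assumptions, not course requirements, and placeholder data stand in for a real data set so that the sketch runs.

```r
# ---- 0. Packages: load everything the script needs at the top ----
# library(haven)   # e.g. for read_sav(); commented out so the sketch runs without it

# ---- 1. Settings ----
set.seed(2024)     # fixes the random numbers so every run gives the same result

# ---- 2. Data ----
# Prefer a path relative to the project folder over an absolute S:/ path:
# dat <- read.csv("data/my_data.csv")     # hypothetical file name
dat <- data.frame(x = rnorm(20))           # placeholder data for this sketch

# ---- 3. Preparation ----
dat$x_std <- (dat$x - mean(dat$x)) / sd(dat$x)   # standardize x (mean 0, sd 1)

# ---- 4. Analysis ----
summary(dat$x_std)

# ---- 5. Documentation ----
sessionInfo()      # records the R and package versions that were used
```

Numbering the sections also makes it easier to state which line ranges each group member is responsible for.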
R is free and open source.
R has excellent online documentation.
R has a very active user community (forums, blogs, etc.).
R is more than just statistical software.
R has interfaces to numerous other programs.
R is interdisciplinary.
R is gaining in importance and popularity! (see Popularity Statistics)
With RStudio, there is now a powerful tool for an easy and efficient workflow.
Advantages
Disadvantages
Which organizations are behind R?
RStudio Environment
First steps
getwd(): Displays the working directory.
setwd(): Defines a new working directory.
dir(): Displays the contents of the current working directory.
Do not use \ in path specifications; use / or \\ instead.
a <- 10 or a = 10: Creates or overwrites the object a with the content on the right (10).
a: Displays the content of the object a.
rm(): Deletes objects from the workspace.
save(a, b, file = "example.RData"): Saves the specified objects (a, b) in the current working directory.
load("example.RData"): Loads all objects saved in the specified file.
#: Starts a comment; everything after it on the line is not interpreted.
ls(): Displays all objects in the current workspace.
# Comments start with #
# Everything in the line after # is ignored by R
5+5
getwd() # Display working directory
# Define working directory
# setwd("C:/some/path/")
dir() # Display the contents of the working directory
a <- 50 # Creates object a (number vector of length 1) with the single value 50
a
# With c() - concatenate you can also build a number vector with several elements:
b <- c(1, 2, 3, 4)
# or shorter
b <- seq(1,4)
# or even shorter
b <- 1:4
# Create object containing the first names of the Beatles
the.beatles <- c("John", "Paul", "George", "Ringo")
the.beatles # unlike a, this object is a character (string) vector
# Object names must not contain spaces. The Google R Style Guide also recommends avoiding - and _
# Names should be meaningful. Naming should be consistent throughout the code file (dots, upper/lower case)
ls() # Display workspace
rm(the.beatles) # Delete object
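The save() and load() entries above can be tried out as a round trip. This minimal sketch writes to a temporary folder via tempdir() instead of the working directory (a choice made here so that your working directory stays clean):

```r
# Create two example objects
a <- 10
b <- c(1, 2, 3, 4)

# Save both objects to a file in a temporary folder
path <- file.path(tempdir(), "example.RData")
save(a, b, file = path)

# Delete the objects from the workspace ...
rm(a, b)
exists("a")  # FALSE: a is gone

# ... and restore them from the file
load(path)
ls()         # a and b are back (together with path)
a + sum(b)   # 20
```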
Many objects (functions, data records, etc.) are located in so-called packages (see here)
Packages are written and maintained by countless voluntary authors. As a result, numerous methodological niches are well covered (especially in comparison to other statistical software packages)
Some packages are part of the core scope of R and are loaded by default when R is started. Other packages can be loaded on request.
Warning: Objects contained in packages (e.g. functions) may overlap by name (“function is masked”)
Functions in connection with packages:
library(packagename): loads an installed package
library(): displays all installed packages
library(help = packagename): shows some package information
search(): shows the currently loaded packages
detach("package:packagename"): "unloads" a package again
ls("package:packagename"): shows all objects within a package
packagename::bar: uses a single object (bar) from a package instead of loading the whole package
install.packages("packagename"): installs a package
remove.packages("packagename"): uninstalls a package
# Let's assume we want to read in an SPSS file:
??spss
# provides references to the functions read.spss and read_por in the packages foreign and haven
# Install package
# install.packages("foreign")
# read.spss() is only available once the package has been loaded:
library(foreign)
?read.spss
# But general recommendation: Google or AI now usually provide better results than the help function
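The masking warning mentioned above is easiest to understand with a small experiment. This sketch uses only packages that come with every R installation (stats and base), so nothing needs to be installed; the function we define ourselves is a deliberately silly stand-in.

```r
# packagename::object uses one object without attaching the package.
# stats ships with every R installation, so this works everywhere.
med <- stats::median(c(3, 1, 2))
med              # 2

# Masking in practice: an object we create hides a package object with the same name
mean <- function(x) "my own mean"
mean(1:3)        # "my own mean": R finds our object in the workspace first
base::mean(1:3)  # 2: the :: prefix reaches the original function
rm(mean)         # delete our version ...
mean(1:3)        # 2: ... and mean() refers to base::mean() again

# search() shows the order in which R looks up a name
search()
```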
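Arithmetic operators: + - * / ^
Logical operators: & |
Comparison operators: == != > < >= <=
(For help, see ?function in each case)
Mathematical functions: exp(x) = e^x, log(x), log10(x), sin(x), cos(x), tan(x), abs(x), sqrt(x), ceiling(x), floor(x), trunc(x), round(x, digits = n)
# Calculate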
result <- (23+24)*11/(18+15)*5
result
# Functions
log(2)
cos(2)
# Comparison
x <- -3:3
x
# Are the elements of x equal to 0?
x == 0
# greater than 0?
x > 0
# less than 0?
x < 0
# greater than or equal to 0?
x >= 0
# less than or equal to 0?
x <= 0
# not equal to 0?
x != 0
# greater than -1 but less than 1
x > -1 & x < 1
# greater than 1 and less than -1
x > 1 & x < -1
# greater than 1 or less than -1
x > 1 | x < -1
# 1.
((3+4-5)-9)^2
log(1)
# 2.
5==7
sqrt(3)!=cos(17)
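The rounding functions listed above differ mainly for negative numbers and for values ending in .5. A quick comparison with example values chosen for illustration, plus an operator-precedence trap with ^:

```r
x <- c(-2.5, -1.4, 1.4, 2.5)
ceiling(x)   # -2 -1  2  3  (round up)
floor(x)     # -3 -2  1  2  (round down)
trunc(x)     # -2 -1  1  2  (cut off the decimals)
round(x)     # -2 -1  1  2  (halves go to the nearest even number)
round(3.14159, digits = 2)   # 3.14

# Operator precedence: ^ is evaluated before the minus sign
-2^2         # -4
(-2)^2       # 4
```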
class(): Reveals the class of an object
numeric
logical (TRUE/FALSE)
Character/String
Some data types can be converted, e.g. with as.numeric() or as.character()
List, e.g. list(1, "Hello", TRUE)
Data frame: “list” of vectors of the same length
Factors: represent categorical data. They are stored as integer codes, each linked to a value label
# Numeric vector (c(1, 2, 3) is stored as double, not integer; 1:3 would be integer)
x <- c(1, 2, 3)
class(x)
x
# Logical vector
x <- -3:3
y <- x >= 0
y
class(y)
# String/character vector
x <- c("a", "b", "c")
class(x)
x
# list (the object is not called "list" to avoid confusion with the function list())
my.list <- list(a = 4:8, b = c("a", "b", "c"), c = c(TRUE, FALSE))
class(my.list)
my.list
# factors
sex <- c(0, 0, 1, 1)
factor(sex, labels=c("man", "woman"))
# Functions: e.g. cos(); mean()
class(mean)
mean
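A short sketch of three points from the list above: conversion between types (including what happens when a conversion fails), the difference between a factor's labels and its internal codes, and how to access list elements. The example values are invented for illustration; suppressWarnings() only hides the "NAs introduced by coercion" warning.

```r
# 1. Converting between types
answers <- c("34", "51", "unknown")           # invented answers from a form (text)
ages <- suppressWarnings(as.numeric(answers)) # "unknown" cannot be converted and becomes NA
ages                                          # 34 51 NA
class(ages)                                   # "numeric"
as.character(42)                              # "42"

# 2. Factors store integer codes, not the original values
sex <- factor(c(0, 0, 1, 1), labels = c("man", "woman"))
levels(sex)                  # "man" "woman"
as.integer(sex)              # 1 1 2 2 (internal codes, not 0/1!)
f <- factor(c(10, 20, 10))
as.numeric(as.character(f))  # 10 20 10: go via character to recover the values

# 3. Accessing list elements
info <- list(a = 4:8, b = c("a", "b", "c"), c = c(TRUE, FALSE))
info$a       # the element named a (a vector)
info[["b"]]  # the same by name, in double brackets
info["b"]    # single brackets return a shorter list, not the vector
```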
Inf and -Inf: positive and negative infinity
NaN: "Not a number", e.g. 0/0
NA: missing value
# Important note on missing values:
x <- c(1, 2, NA, 4)
# wrong: any comparison with NA returns NA, and "NA" in quotes is just a text string
x == NA
x == "NA"
#correct:
is.na(x)
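Most R statistics functions return NA as soon as a single value is missing. The sketch below reuses the vector x from above and shows the na.rm argument, how to count missing values, and how NaN and Inf differ from NA.

```r
x <- c(1, 2, NA, 4)
mean(x)                 # NA: one missing value makes the result unknown
mean(x, na.rm = TRUE)   # 2.333333: the NA is removed before calculating
sum(is.na(x))           # 1: counts the missing values (TRUE counts as 1)
x[!is.na(x)]            # 1 2 4: keeps only the observed values

# NaN and Inf are results of calculations, not missing answers
0 / 0                   # NaN
1 / 0                   # Inf
is.na(NaN)              # TRUE: is.na() also flags NaN
is.nan(NA)              # FALSE: but NA is not NaN
```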
data.frame(): creates a data frame
as.data.frame(): converts to a data frame
order(): sorts data
summary() and str(): overview of data frames
head() and tail(): inspect the first/last lines
names(): shows column names
object$var1: directly accesses the column var1 in the data frame object
na.omit(): row-by-row exclusion of missing values, i.e. drops all rows that contain at least 1 missing value
# Prepared data are often data frames.
richtungswechsel <- read.csv("S:/MA1082973/_FHNW/BA472/data sets/Richtungswechsel/Richtungswechsel_anonymized data set.csv")
class(richtungswechsel)
# you can also easily build one yourself
beruf <- c("Lehrerin", "Verkäufer", "Pilotin")
nation <- c("CH", "DE", "IT")
id <- 1:3
df <- data.frame(id, beruf, nation)
df
# Accessing columns and single cells by name or position
df$nation
df[, "nation"]
df[3, "beruf"]
df[3, 3]
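Several functions from the data frame list above (str(), summary(), order(), na.omit() and $ for new columns) have not been used in the examples yet. A small sketch with invented client data; the column names are hypothetical.

```r
# Invented example data with one missing value
clients <- data.frame(
  id = 1:4,
  age = c(45, 23, NA, 31),
  benefit_months = c(12, 3, 8, 24)
)
str(clients)                               # 4 obs. of 3 variables
summary(clients$age)                       # also reports the number of NA's

# order() returns row positions; use them in the row slot of [ , ]
clients[order(clients$age), ]              # sorted by age, NA last
clients[order(-clients$benefit_months), ]  # descending: minus sign

# na.omit() drops every row with at least one NA (here: row 3)
complete <- na.omit(clients)
nrow(complete)                             # 3

# $ on the left-hand side creates a new column
clients$age_group <- ifelse(clients$age < 30, "under 30", "30+")
clients$age_group                          # "30+" "under 30" NA "30+"
```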
# How can I access specific elements of a vector directly?
x <- seq(2, 200, 2)
x
x[1] # first element of x
x[1:10] # the first 10 elements of x
# For two-dimensional objects, both rows and columns can be accessed:
# load Richtungswechsel data
richtungswechsel <- read.csv("S:/MA1082973/_FHNW/BA472/data sets/Richtungswechsel/Richtungswechsel_anonymized data set.csv")
richtungswechsel[1:2, c(3, 6)] # reads: "first to second row, third and sixth column"
# Mnemonic: rows first, columns second.
# Apart from the position, the name of a column (or row) can also be used for referencing:
richtungswechsel[, c("Geschlecht", "Staatsang.", "europe")]
# or by a condition:
richtungswechsel[richtungswechsel$Bezugsdauer > 4, ]
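One caveat when selecting rows by a condition: if the condition column contains missing values, R returns a row filled with NA for each of them. The sketch below uses a tiny data frame with the column names from the Richtungswechsel example, but the values are invented; it shows how which() avoids the NA rows.

```r
# Invented values, only the column names are borrowed from the example above
rw <- data.frame(
  Alter = c(22, 25, 33, NA, 40, 58),
  Bezugsdauer = c(2, 6, 5, 3, NA, 9)
)

# The condition yields one TRUE/FALSE per row, or NA where data are missing
rw$Bezugsdauer > 4         # FALSE TRUE TRUE FALSE NA TRUE

# Logical indexing turns the NA into a row full of NAs
rw[rw$Bezugsdauer > 4, ]

# which() keeps only the positions where the condition is TRUE
rw[which(rw$Bezugsdauer > 4), ]

# Inclusive range: aged 25 to 40
rw[which(rw$Alter >= 25 & rw$Alter <= 40), ]
```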
1. Read the data set Richtungswechsel into R.
2. Display rows 10 and 12.
3. Display the variables Bezugsdauer and Bildungsstand for all persons aged between 25 and 40.
# 1.
richtungswechsel <- read.csv("S:/MA1082973/_FHNW/BA472/data sets/Richtungswechsel/Richtungswechsel_anonymized data set.csv")
# 2.
richtungswechsel[c(10, 12), ]
# 3.
richtungswechsel[richtungswechsel$Alter >= 25 & richtungswechsel$Alter <= 40, c("Bezugsdauer", "Bildungsstand")]
dplyr / data.table
melt() and dcast() from reshape2
str_replace(), str_sub() from the stringr package
tolower()
tidyr
recode() by John Fox (package car)
# Weather data
weather <- read.table("https://raw.githubusercontent.com/justmarkham/tidy-data/master/data/weather.txt", header=TRUE)
head(weather) # the variables are in rows and columns
# reshape the data (melt) and remove missing values
library(reshape2) # for melt()/dcast()
weather1 <- melt(weather, id=c("id", "year", "month", "element"), na.rm=TRUE)
head(weather1)
# clean column for "day"
library(stringr) # for str_replace(), str_sub()
weather1$day <- as.integer(str_replace(weather1$variable, "d", ""))
# we do not need the "variable" column
weather1$variable <- NULL
# The element column contains two different variables tmin and tmax.
# These should be in two columns:
weather1$element <- tolower(weather1$element) # lowercase letters
weather.tidy <- dcast(weather1, ... ~ element) # reshape into two columns
head(weather.tidy)
# the date can also be displayed in a column as a real date:
weather.tidy$date <- as.Date(paste(weather.tidy$year,
weather.tidy$month,
weather.tidy$day, sep="-"))
weather.tidy[, c("year", "month", "day")] <- NULL
head(weather.tidy)
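The last two steps of the weather example (cleaning the day column and building a date) can be tried in isolation with base R only. In this sketch, sub() stands in for stringr::str_replace() (a swap made so that no package has to be installed), and the example values are invented.

```r
# Invented values in the layout of the weather example
year    <- c(2010, 2010)
month   <- c(1, 12)
day_col <- c("d30", "d5")    # column names as they appear after melt()

# Remove the leading "d" and convert to whole numbers
day <- as.integer(sub("^d", "", day_col))
day                          # 30 5

# paste() builds "2010-1-30", which as.Date() reads as year-month-day
date <- as.Date(paste(year, month, day, sep = "-"))
date                         # "2010-01-30" "2010-12-05"
format(date, "%d.%m.%Y")     # Swiss notation: "30.01.2010" "05.12.2010"
date[2] - date[1]            # time difference in days
```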
Read the SHP data set into R. You can use the command read_sav("your working directory/SHPLONG_P_USER.sav") from the haven package.
Look at the data structure and the variables of the data set.
Also use functions such as summary(dataset$var), head(dataset) and names(dataset).
# load libraries & data
library(haven)
shp <- read_sav("S:/MA1082973/_FHNW/BA472/data sets/SHP/SHPLONG_P_USER.sav")
# data structure
head(shp)
names(shp)
summary(shp$AGE)
# Variant 2 for getting the SHP data: download.file() downloads a file if you have a direct link to it
download.file("https://drive.switch.ch/index.php/s/02NutftoUqK4x9V/download", "shp2022.RData", mode = "wb")
# Because the file is in R data format (.RData), we can load it directly (adjust the path to where the data is stored)
load("S:/MA1082973/_FHNW/BA472/data sets/SHP/shp2022.RData")
table(x): one-dimensional contingency table
table(x, y): two-dimensional contingency table
prop.table(table(x)): relative frequencies
# Let's define a new variable age that contains the age (in years) of ten people.
age <- c(76, 54, 38, 96, 32, 76, 81, 81, 50, 75)
# Let hosp be a variable that indicates whether the same ten persons have been hospitalized in the last six months (1 = yes, 0 = no)
hosp <- c(1, 0, 0, 0, 0, 1, 0, 0, 1, 0)
# Table
table(age)
table(age, hosp)
# Table in percent
100*prop.table(table(age))
100*prop.table(table(age, hosp))
mean(x): mean
sd(x): standard deviation
var(x): variance
median(x): median
min(x): minimum
max(x): maximum
# Mean
mean(age)
# Standard deviation
sd(age)
# Median
median(age)
sort(age)
# summary() function
summary(age)
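Building on the age and hosp vectors above, a sketch of two common next steps: comparing a statistic between groups with tapply() and aggregate() (base R functions not introduced above), and row percentages with the margin argument of prop.table().

```r
age  <- c(76, 54, 38, 96, 32, 76, 81, 81, 50, 75)
hosp <- c(1, 0, 0, 0, 0, 1, 0, 0, 1, 0)

# Mean age per group: hosp = 0 (not hospitalized) and hosp = 1 (hospitalized)
tapply(age, hosp, mean)
# The same with a formula, returned as a data frame
aggregate(age ~ hosp, data = data.frame(age, hosp), FUN = mean)

# Share of hospitalized persons in percent
100 * prop.table(table(hosp))   # 0: 70, 1: 30

# Row percentages: within each age group, what share was hospitalized?
over70 <- age > 70
100 * prop.table(table(over70, hosp), margin = 1)
```

With margin = 1 each row sums to 100, which is usually what you want when comparing groups of different sizes.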
filter(): selects a subset of rows (see also slice())
arrange(): sorts rows
select(): selects columns
mutate(): creates new columns
summarize(): aggregates (collapses) data into summary values
distinct(): removes duplicate rows
group_by(): defines subgroups in the data so that mutate() and summarize() can be applied separately per group.
These steps can be chained with the pipe operator %>%, which makes the code much easier to read and more compact.
# load data
library(dplyr)
shp2022
# filter by 1st nationality not Switzerland and persons up to 65 years old
# select columns
shp2022a <- shp2022 %>%
filter(NAT_1_ != 8100, AGE < 66) %>%
select(c("AGE", "SEX", "NAT_1_", "EDUCAT"))
# create new variable "Tertiary education"
shp2022a <- shp2022a %>%
mutate(EduTertiary = EDUCAT == 10)
# Count by gender and with a tertiary education (TRUE/FALSE)
tab <- shp2022a %>%
group_by(SEX, EduTertiary) %>%
summarise(n=n()) %>%
arrange(SEX, EduTertiary) %>%
na.omit()
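The same filter, group and summarise pattern also computes group statistics such as mean age by sex. Since the SHP file is not available everywhere, this sketch uses a tiny invented data frame that only borrows the SHP variable names AGE and SEX; it assumes dplyr is installed, and the codes 1 and 2 for SEX are example values.

```r
library(dplyr)

# Invented toy data (not SHP values)
people <- data.frame(
  AGE = c(30, 45, 52, 28, 61, 39, 19),
  SEX = c(1, 2, 2, 1, 1, 2, 1)
)

by_sex <- people %>%
  filter(AGE >= 25) %>%                    # keep persons aged 25 or older
  group_by(SEX) %>%                        # one group per SEX code
  summarise(mean_age = mean(AGE, na.rm = TRUE),
            sd_age   = sd(AGE, na.rm = TRUE),
            n        = n())                # number of persons per group
by_sex
```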
Read the SHP data into RStudio.
Restrict the data set for the year 2022 to people who are 25 years or older. Familiarize yourself a little with the data (e.g. head(), summary(), table()).
Look at the variables with the information on age (variable AGE), gender (variable SEX), years of education (variable EDYEAR) and first nationality (NAT_1_).
Create a crosstab with the variables EDYEAR and SEX.
Calculate the mean and standard deviation of the variable AGE for men and women.
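As a first look at the correlation and test functions listed next, here is a base-R sketch. All values are invented for illustration and are not SHP data; with so few cases the test result itself means little.

```r
# Invented toy data for eight persons (not SHP values)
age    <- c(28, 35, 41, 52, 60, 33, 47, 58)
edyear <- c(12, 15, 16, 12, 10, 17, 14, 11)
sex    <- factor(c("man", "woman", "woman", "man",
                   "woman", "man", "woman", "man"))

# Mean and standard deviation of age per sex
tapply(age, sex, mean)
tapply(age, sex, sd)

# Pearson correlation (default) and Spearman rank correlation
cor(age, edyear)
cor(age, edyear, method = "spearman")

# Welch two-sample t-test (the default): does mean age differ between men and women?
result <- t.test(age ~ sex)
result$p.value
```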
cov(): covariance
cor(): correlation
cor(x, y, method = "spearman"): rank correlation
?cor: more information in the help file
chisq.test(): chi-square test
t.test(): t-test
Why is data visualization important?
Data exploration vs. data presentation
Simple diagrams
Histograms
Scatter charts
Bar charts
Hadley Wickham’s ggplot2 package has developed into a particularly useful alternative to plot() over the last few years. Complicated plots in particular are easier to implement with ggplot2, the results are visually appealing, and the code is easy to follow. The most important basic building blocks:
Data that we want to visualize
Geometries to define the shapes we want to use for visualization (e.g. a scatter plot, line chart, bar chart)
Modify aesthetics to convey different meanings (e.g. colors, size, thickness of a line)
Define mappings between data variables and aesthetics (e.g. which variable determines the size or color of the data points)
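The four building blocks can be seen in a minimal plot. This sketch uses invented data with hypothetical column names and assumes ggplot2 is installed. Note the difference between a mapping inside aes() (a variable determines the colour) and a fixed setting outside aes() (every point gets size 3).

```r
library(ggplot2)

# Invented toy data: counselling hours and life satisfaction (0-10) for 6 clients
clients <- data.frame(
  hours = c(2, 5, 8, 3, 10, 6),
  satisfaction = c(4, 6, 7, 5, 8, 6),
  group = c("A", "B", "A", "B", "A", "B")
)

p <- ggplot(data = clients,                      # data
            aes(x = hours, y = satisfaction,     # mappings: variables -> aesthetics
                colour = group)) +
  geom_point(size = 3) +                         # geometry; fixed size set outside aes()
  labs(title = "Satisfaction by counselling hours (invented data)",
       x = "Counselling hours", y = "Life satisfaction (0-10)",
       colour = "Group")
p  # draw the plot
```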
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.1.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(haven)
## Warning: package 'haven' was built under R version 4.1.3
# set working directory
setwd("S:/MA1082973/_FHNW/BA472")
# load data
shp <- read_sav("data sets/SHP/SHPLONG_P_USER.sav")
# Scatter plot (simple)
shp2022 <- shp %>%
filter(YEAR==2022) %>% filter(AGE > 24)
ggplot(data=shp2022, aes(x = PC46, y =PC45)) +
geom_point() +
labs(title = "Scatter plot",
x = "Weight in kg",
y = "Height in cm")
## Warning: Removed 3083 rows containing missing values (`geom_point()`).
# Scatter plot with points colored by sex
ggplot(data=shp2022, aes(x = PC46, y =PC45, color=factor(SEX))) +
geom_point() +
labs(title = "Scatter plot",
x = "Weight in kg",
y = "Height in cm",
color = "Sex") +
scale_color_manual(values=c("lightgreen", "darkviolet", "red"),
labels=c("Male", "Female", "Other"))
## Warning: Removed 3083 rows containing missing values (`geom_point()`).
# Create a bar chart
# Remove NA values for the variables SEX and PC44
shp2022 <- shp2022 %>% filter(!is.na(SEX)) %>% filter(!is.na(PC44))
ggplot(shp2022, aes(x=factor(PC44), y=after_stat(count), fill=factor(SEX))) +
geom_bar(stat="count", position="dodge") +
labs(x="Life satisfaction", y="Count", fill="Sex") +
# Legend labels
scale_fill_manual(values=c("lightgreen", "violet", "red"),
labels=c("Male", "Female", "Other")) +
ggtitle("Life satisfaction by sex")
labs() argument to the chart.
data$variable <- as.factor(data$variable)). Also restrict the income (exclude outliers, e.g. all incomes over 200000).
Input() from the user goes to the R server, and the R server sends Output() back.
library(shiny)
ui <- fluidPage() with input and output functions. Possible input and output functions can be found here.
server <- function(input, output) {}
shinyApp(ui = ui, server = server) (if ui and server are in one script)
# Ensure shiny and ggplot2 are installed
# install.packages("shiny")
# install.packages("ggplot2")
library(shiny)
library(ggplot2)
# Example data: a simple randomly generated dataset related to social work
set.seed(123) # makes the random example data reproducible
data <- data.frame(
age = sample(18:65, 100, replace = TRUE), # Ages between 18 and 65
satisfaction = sample(1:10, 100, replace = TRUE) # Satisfaction levels from 1 to 10
)
# Define the UI
ui <- fluidPage(
titlePanel("Data Visualization for Social Work"),
sidebarLayout(
sidebarPanel(
sliderInput("ageRange",
"Select Age Range:",
min = min(data$age),
max = max(data$age),
value = c(25, 40)),
sliderInput("satisfactionRange",
"Select Satisfaction Range:",
min = min(data$satisfaction),
max = max(data$satisfaction),
value = c(4, 7))
),
mainPanel(
textOutput("countOutput"),
plotOutput("ageDistributionPlot") # histogram of the ages in the selected range
)
)
)
# Define server logic
server <- function(input, output) {
filteredData <- reactive({
data[data$age >= input$ageRange[1] & data$age <= input$ageRange[2] &
data$satisfaction >= input$satisfactionRange[1] & data$satisfaction <= input$satisfactionRange[2], ]
})
output$countOutput <- renderText({
paste("Number of cases within selected range:", nrow(filteredData()))
})
output$ageDistributionPlot <- renderPlot({
ggplot(filteredData(), aes(x = age)) +
geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
theme_minimal() +
labs(title = "Age Distribution of Selected Cases",
x = "Age",
y = "Frequency")
})
}
# Run the app
shinyApp(ui = ui, server = server)